Heterogeneity in the gene regulatory landscape of leiomyosarcoma

Abstract Characterizing inter-tumor heterogeneity is crucial for selecting suitable cancer therapy, as the presence of diverse molecular subgroups of patients can be associated with disease outcome or response to treatment. While cancer subtypes are often characterized by differences in gene expression, the mechanisms driving these differences are generally unknown. We set out to model the regulatory mechanisms driving sarcoma heterogeneity based on patient-specific, genome-wide gene regulatory networks. We developed a new computational framework, PORCUPINE, which combines knowledge on biological pathways with permutation-based network analysis to identify pathways that exhibit significant regulatory heterogeneity across a patient population. We applied PORCUPINE to patient-specific leiomyosarcoma networks modeled on data from The Cancer Genome Atlas and validated our results in an independent dataset from the German Cancer Research Center. PORCUPINE identified 37 heterogeneously regulated pathways, including pathways representing potential targets for treatment of subgroups of leiomyosarcoma patients, such as FGFR and CTLA4 inhibitory signaling. We validated the detected regulatory heterogeneity through analysis of networks and chromatin states in leiomyosarcoma cell lines. We showed that the heterogeneity identified with PORCUPINE is not associated with methylation profiles or clinical features, thereby suggesting an independent mechanism of patient heterogeneity driven by the complex landscape of gene regulatory interactions.


INTRODUCTION
Soft-tissue sarcomas are a group of rare and highly aggressi v e malignancies. While they account for less than 1% of all malignant tumors, soft-tissue sarcomas are a tremendously heter ogeneous gr oup of tumors and include > 150 different histological subtypes ( 1 ). Partly because of this heterogeneity, significant challenges exist in the management of soft-tissue sarcomas. Most soft-tissue sarcomas are treated similarly in the clinic, regardless of their site of origin, with surgery with or without r adiother apy as the main treatment for localized disease ( 2 ). Se v eral clinical trials have been conducted in soft-tissue sarcomas. Howe v er, until recently such trials included patients with many different histological subtypes in the same cohort, causing difficulties to conclude on the efficacy of these therapies in the individual subtypes. Differences in clinical response among soft-tissue sarcoma subtypes led to newer studies that only enrolled patients of certain histological subtypes, which have shown to result in better response and disease control ( 3 ).
Over the past years it has become evident that treatments tailored to a single patient, or group of patients belonging to a specific molecular subtype of cancer, can result in major improvements in cancer outcomes ( 4 ). For example, characterizing inter-patient molecular tumor heterogeneity was shown to be crucial for selecting the most efficient cancer therapy, and the presence of di v erse molecular subtypes can predict patient survival in breast cancer ( 5 ) and relapse or resistance to treatment in melanoma ( 6 ). Ther efor e, it is clear that the integration of personalized medicine into cancer trea tment stra tegies r equir es extensi v e knowledge of inter-pa tient variability. Pa tients can, for example, be grouped into molecular subtypes based on 'omics' data, such as gene expression, microRNA, DNA methyla tion, soma tic muta tions, or pr oteomic pr ofiles.
The molecular landscape of soft-tissue sarcomas has been characterized in se v eral studies (7)(8)(9)(10). The Cancer Genome Atlas (T CGA) sar coma project, one of the largest sar coma sequencing projects to-date, performed a comprehensi v e and integrated analysis of 206 adult soft-tissue sarcomas, r epr esented by six major subtypes, and showed that sarcomas vary greatly at the genetic, epigenetic, and transcriptomic le v els ( 7 ). More recently, some histological subtypes of soft-tissue sarcomas were further delineated into molecular subgroups according to their genomic and transcriptomic profiles. For example, Guo et al. , characterized three molecular subtypes of leiomyosarcoma (LMS) --one of the most common subtypes of soft-tissue sarcomas --based on transcriptomic data ( 11 ). One of these subtypes was overr epr esented by uterine leiomyosarcoma, while the other tw o were o ver-represented by extra-uterine sites. While these subtypes were not associated with tumor grade, they were somewhat related to patient survi val. Howe v er, the causati v e regulatory mechanisms that distinguish these subtypes are not fully understood and the impact of molecular profiling of soft-tissue sarcomas on patient outcomes has been limited.
Through the modeling of interactions between transcription factors (TFs) or other regulators and their potential target genes, gene regulatory networks offer an in-depth view on the mechanisms that dri v e gene e xpression ( 12 ), and thus could help gain greater insight into disease mechanisms. Various integrati v e methods hav e been de v eloped to model such networks genome-wide. One such method is PANDA, which integrates putati v e TF-DNA binding with pr otein-pr otein interactions and target gene co-expression to infer a regulatory network for a specific condition ( 13 ). Recently, we de v eloped an algorithm that can be combined with condition-specific network models estimated with e.g. PANDA to infer pa tient-specific regula tory networks (LI-ONESS ( 14 )). These patient-specific network models have been instrumental in capturing sex differences in gene regulation in healthy tissues ( 15 ) and colon cancer ( 16 ), as well as in identifying regulatory interactions associated with gliob lastoma survi val ( 17 ).
In this work, we demonstra te tha t analysis of heterogeneity among patient-specific gene regulatory networks can facilitate stratification of cancer patients into novel subtypes and identification of the regulatory programs that dri v e such heterogeneity. To characterize regulatory heterogeneity, we present a new computational approach, PORCUPINE ( P rincipal Components Analysis to O btain R egulatory C ontributions U sing P athway-based I nterpretation of N etwork E stimates). PORCUPINE can be applied to patient-specific networks modeled with PANDA and LIONESS and detects statistically significant, key regulatory pathways that dri v e regulatory heterogeneity among patients. In specific, we applied the method to 80 genome-wide leiomyosarcoma regulatory networks, which we modeled on data from T CGA (r eferr ed to below as TCGA-LMS).
We validated the pathways detected by PORCUPINE in an independent dataset r epr esented by 37 cases of leiomyosarcoma from the study by Chudasama et al. (referred to below as DKFZ-LMS) ( 18 ). We found high concordance in regulatory heterogeneity in both cohorts, identifying 37 shared hetero geneousl y regulated pathways. These included pathways that play a known role in leiomyosarcoma biology and pa thways tha t have not been described before in the disease. Newly identified pathways include FGFR signaling and CTLA4 inhibitory signaling and r epr esent potential targets for tr eatment of subgroups of leiomyosarcoma patients. We validated the detected regulatory heterogeneity through analysis of networks and chroma tin sta tes in leiomyosar coma cell lines. Mor eover, we show that the heterogeneity identified with PORCUPINE is not associated with methylation profiles or clinical featur es, ther eby suggesting an independent mechanism of patient heterogeneity dri v en by the comple x landscape of gene regulatory interactions.

Gene expression data preprocessing
We downloaded expression data for all TCGA cases using the 'recount' package in R ( 19 ). The transcriptome data for 37 leiomyosarcoma cases obtained from the German Cancer Research Center (DKFZ) was preprocessed by the Omics IT and Data Management Core Facility (DKFZ ODCF) using the One Touch Pipeline ( 20 ). We performed batch correction on the raw expression counts of the set of 206 T CGA soft-tissue sar comas and the 37 DKFZ-LMS samples together, using the 'sva' package v3.35.2 in Bioconductor in R 3.6.1 ( 21 ). We then combined Combatseq-adjusted counts with the raw expression counts of the r emaining T CGA samples and performed smooth quantile normalization using 'qsmooth' package in Bioconductor to pr eserve global differ ences in gene expr ession between the different cancer types ( 22 ), specifying each cancer type as a separate group le v el. Samples of 206 TCGA soft-tissue sarcomas and 37 DKFZ-LMS samples were specified as the same 'soft-tissue sarcoma' group le v el (Supplementary Figure S1).

Construction of individual patient gene regulatory networks
We used the MATLAB version of the PANDA network reconstruction algorithm (available in the netZoo repository https://github.com/netZoo/netZooM ) to estimate an 'aggr egate' gene r egulatory network, based on a total of 11,321 samples , 17,899 genes , and 623 TFs. These samples included 206 TCGA and 37 DKFZ soft-tissue sarcomas --the remaining samples r epr esented other cancer types available in TCGA. We used the entire TCGA dataset to build the aggregate network, as we previously found that LIONESS' estimates of single-sample edges are more robust when including a large, heterogeneous background of samples ( 14 ).
PANDA builds an aggregate network by incorporating three types of data --a 'prior' regulatory network, which is based on a TF motif scan to identify putati v e regulatory interactions between TFs and their target genes, proteinprotein (PPI) interactions between TFs, and target gene expression data. The aggregate network modeled by PANDA consists of weighted edges between each TF-target gene pair. These edge weights reflect the strength of the inferred r egulatory r elationship.
The prior gene regulatory network was generated using a set of TF motifs obtained from the Catalogue of Inferred Sequence Binding Pr efer ences (CIS-BP) ( 23 ), as described by Sonawane et al. ( 24 ). These motifs were scanned to promoters as described previously ( 25 ). The prior network was intersected with the expression data to include genes and TFs with available expression data and at least one significant promoter hit. This resulted in initial map representing potential regulatory interactions between 623 TFs and 17,899 target genes. An initial pr otein-pr otein network was estimated between all TFs from motif prior map using interaction scores from StringDb v10 ( 26 ), which were scaled to be within a range of [0,1], where self-interactions were set equal to one, as described previously ( 24 ). To reconstruct pa tient-specific gene regula tory networks, we applied the LIONESS equation in MATLAB (available in the netZoo repository https://github.com/netZoo/netZooM ).

UMAP visualization
To visualize the clustering distribution of the 206 TCGA soft-tissue sarcoma patient-specific gene regulatory networks, we applied dimensionality reduction with Uniform Manifold Approximation and Projection (UMAP), using the 'uwot' package v0.1.5 in R 3.6.1, setting the number of nearest neighbours to 20. We performed UMAP on the matrix of gene targeting scores obtained from the 206 individual sarcoma networks. Gene targeting scor es ar e defined as the sum of all edge weights pointing to a gene and r epr esent the amount of regulation a gene recei v es from the entire set of TFs available in a network ( 27 ). These scores have previously been used to identify gene regulatory differences in various studies ( 16 , 17 , 27 ). Additionally, we performed visualization of the distribution of 80 T CGA leiomyosar coma samples based on gene targeting scores and expression in two-dimensional UMAP space. To compare the neighbourhood structures between the UMAPs, we used the approach described in the study by Taskesen et al. ( 28 ).

Identifying regulatory heterogeneity using PORCUPINE
To capture inter-patient heterogeneity (r eferr ed to below as 'heterogeneity') at the gene regulatory le v el, we de v eloped a computational frame wor k, which we call PORCUPINE. PORCUPINE is a Principal Components Analysis (PCA)based approach that can be used to identify key pathways that dri v e heterogeneity among indi viduals in a dataset. It determines whether a specific set of variables --for example a set of genes in a specific pathway --have coordinated variability in their regulation. PORCUPINE uses as input individual patient networks, for e xample networ ks modeled using PANDA and LI-ONESS, as well as a .gmt file (in MSigDB file format ( 29 )) that includes biological pathways and the genes belonging to them. For each pathway, it then extracts all regulatory network edges connected to the genes belonging to that pathway. It then scales each edge across individuals with a z-score transformation, so that edges have zero mean and unit variance. It then performs a PCA analysis on these edge w eights, as w ell as on a null background that is based on random pathways. For the randomization (permutation), PORCUPINE creates a set of 1,000 gene sets equal in size to the pathway of inter est, wher e genes ar e randomly selected from all genes present in the .gmt file. The edges connected to these genes are then extracted. The amount of variance explained by the first principal component (PC1) in the pathway of interest is then compared to the amount of variance explained by PC1 in the random (permuted) data.
To identify significant pathways, PORCUPINE applies a one-tailed t-test and calculates the effect size (ES). The latter is calculated as the difference between the variance explained by PC1 of the pathway of interest and the mean of the variance explained by PC1 corresponding to the random sets of pathways, divided by standard deviation of the v ariance explained b y PC1 in the random sets using the cohensD function in the 'lsr' package in R. P -values are adjusted for multiple testing with the Benjamini-Hochberg method ( 30 ) and significant pathways ar e r eturned based on user-defined thresholds on the FDR-adjusted P -value and effect size. We de v eloped PORCUPINE as R package and it is available as open-source code on GitHub ( https: //github.com/kuijjerlab/PORCUPINE ).
We applied PORCUPINE to TCGA and DKFZ leiomyosarcoma data using Reactome pathways from MSigDB v7.1, excluding pa thways tha t consisted of > 200 genes. Pathways with adjusted P -value < 0.01, explained variance ≥10%, and effect size ≥2 were reported as significant. As the number of genes in each pathway is different, we investigated whether the obtained results were biased tow ards pathw ays of smaller size. To test this, we split pathways in four groups based on their size, namely pathways containing less than 50, 50-100, 100-150, 150-200 genes. We then calculated the proportions of these groups among Reactome pathways and among the set of deregulated pathways identified in the TCGA-LMS and DKFZ-LMS datasets.

Identification of top ranked target genes and transcription factors
To identify those genes and TFs that contribute most to the pathway's significance, we extracted the edge loadings of the first principal component (r eferr ed to below as the 'edge contribution score'). Since the sum of the squares of all edge contribution scores for an individual principal component is equal to one by definition, and assuming that all edges contribute equally to that principal component, we can calculate the expected edge contribution score. Edges with a contribution score > 1.5 × the expected score were regarded as important contributors to that principal component. To identify TFs with many co-regulated genes, we then grouped TFs corresponding to these top edges according to the number of their targets.

Association of the significant pathways with clinical phenotypes
In PORCUPINE, each pa thway separa tes pa tients along a specific PC axis, which r epr esents the position of individuals along that axis. Below, we refer to these positions as 'pa thway-based pa tient heterogeneity scores.' To investiga te whether the heterogeneity captured by each pathway was associated with clinical features, we performed an association analysis of the pathway-based heterogeneity scores on the first principal component with the clinical data available for these patients.
Clinical features for TCGA leiomyosarcoma patients were obtained using the 'TCGAbiolinks' package from Bioconductor ( 31 ). Clinical inf ormation f or 37 DKFZ patients was obtained from the study by Chudasama et al. ( 18 ). Since the clinical attributes r epr esent a mix of categorical and numerical features, we applied Kruskal-Wallis and Pearson correlation tests for categorical and numerical featur es, r especti v ely. We corrected P -values for multiple testing using the Benjamini-Hochberg approach and applied a threshold of 0.05 to identify significant associations.
In order to determine whether any of the identified pathways were associated with patient survival, we used the first principal component from each pathway in a Cox r egr ession model to predict patient survival.

Identification of molecular subtypes based on the identified pathways
To cluster a population of patients based on the identified pathways into discrete subtypes, K -means clustering can be applied on the pa thway-based pa tient heterogeneity scores on the first two principal components obtained from a pathway. The optimal number of clusters can be determined prior to clustering using the Average Silhouette Method ( 32 ).

Association of the significant pathways with pathway-based mutation profiles
We downloaded and preprocessed leiomyosarcoma mutation data as previously described in Kuijjer et al. ( 33 ). We used the SAMBAR algorithm ( 33 ) to obtain patient-specific pa thway muta tion scor es for T CGA-LMS patients. Among 1,455 pathways, 954 pathways had mutation scores larger than zero in the TCGA-LMS dataset. To assess the association between pathways identified with PORCUPINE and these pa thways' muta tion scores, we used a Kruskal-Wallis test, comparing the pathway-based patient heterogeneity scores on the first principal component between two groups, i.e. mutated vs not mutated, for each muta ted pa thway. We used FDR-adjusted P -value < 0.05 as threshold for reporting significant differences between the groups.

Association of the identified pathways with o ver all methylation profiles
DNA methyla tion da ta measured on the Illumina Infinium Human Methylation 450 BeadChip platform were down-loaded for all sarcoma patients available in TCGA using the Bioconductor 'TCGA biolinks' package in R. We downloaded raw methylation IDAT files and performed preprocessing and normalization with subset-quantile within array normalization (SWAN) using Bioconductor package 'minfi.' ( 34 ). We calculated overall methylation profiles for each individual by using the mean value across all probes. We then correlated these values to the pathway-based patient heterogeneity scores on the first principal component in each pa thway. Associa tions with FDR-adjusted P -value < 0.05 were considered significant.

Validation of the pathways in healthy tissues
We obtained pa tient-specific regula tory networks for healthy smooth-muscle-deri v ed tissues, r epr esented by esophageal muscularis and uterus from the Genotype-Tissue Expression (GTEx) pr oject, thr ough the GRAND database of gene regulatory network models ( 35 ). In total, 283 and 90 patient-specific networks were available for esophageal muscularis and uterus, respecti v ely. We applied PORCUPINE to evaluate gene regulatory heterogeneity among the individuals in the merged set of 373 networks.

Construction of gene regulatory networks for leiomyosarcoma cell lines
RNA-seq counts were obtained for four leiomyosarcoma cell lines, including SK-UT-1, SK-UT1-B, MES-SA and SK-LMS-1 from the study by Chudasama et al. ( 18 ). To integrate these data with the patient samples, we performed batch correction on the raw expression counts of the set of 206 T CGA soft-tissue sar comas and the 37 DKFZ-LMS samples and the four cell lines together, using the 'Combatseq' package in Bioconductor. Following that, we combined the Combatseq-adjusted counts with the raw expression counts of the r emaining T CGA samples and used 'qsmooth' normalization to obtain normalized counts. Individual networks for cell lines were then modeled using PANDA and LIONESS as described above.

Generation and processing of A T A C-seq data
A complete list of all rea gents, b uffer solutions, and DNA barcode primer sequences is described in Supplementary File 1. ATAC-Seq libraries for SK-UT -1, SK-UT -1B, MES-SA, and SK-LMS-1 were prepared in triplicate according to the Omni-ATAC protocol ( 36 ) with minor modifications. Briefly, 50,000 cells per replicate were collected by centrifuga tion a t 500 × g for 5 min a t 4 • C . Cell pellets wer e r esuspended in 50 l ice-cold lysis buf fer A and incuba ted on ice for 3 min, after which 500 l ice-cold lysis buffer B were added. Cell nuclei were pelleted by centrifugation at 500 × g for 10 min at 4 • C. The supernatant was removed carefully and nuclei pellets were resuspended in 47.5 l ice-cold transposition buffer and 2.5 l Tagment DNA TDE1 enzyme (Illumina). The transposition mix was incubated at 37 • C for 30 mins at 1,000 rpm. After adding 20 l 5 M guanidinium thiocyanate, tagmented DNA was purified using Agencourt AMPure XP magnetic beads (Beckman Coulter).
NAR Cancer, 2023, Vol. 5, No. 3 5 Sequencing libraries were generated via qPCR by mixing purified tagmented DNA with 25 l 2X NEBNext High-Fidelity PCR Master Mix (NEB), 2.5 l Tn5mCP1n forward primer, 2.5 l Tn5mCBar reverse primer, and 0.5 l 100X SYBER Green I (In vitrogen). The f ollowing PCR program was implemented: 1 cycle of 72 • C for 5 min, 1 cycle of 98 • C 30 s, and 10 cycles of 98 • C for 10 s, 63 • C for 30 s, 72 • C for 30 s. Following two-sided size selection with 0.5 × and 1.4 × of Agencourt AMPure XP magnetic beads, library concentration and fra gment distrib ution were checked via the 2200 TapeStation System with the High Sensitivity D1000 ScreenTape / Reagents (Agilent Technologies).
Libraries were sequenced at the DKFZ Genomics and Proteomics Core Facility using the Illumina NextSeq 550 P air ed-End 75 bp (GEO accession: GSE218533). Sequencing reads were processed using the CWL-based ATAC-Seq wor kflow availab le at https: // github.com / CompEpigen / ATACseq workflows ( 37 , 38 ). Peak calling on individual samples was performed with MACS2 with parameters -nomodel -keep dup all -broad -gsize 2736124973 -qvalue 0.05. We followed the DiffBind protocol to obtain a consensus read count matrix from MACS2 peak sets ( 39 ). The ATAC-seq peaks were filtered using the ENCODE blacklist ( 40 ) and only the peaks present at least in any two samples were included in the analysis. Peaks were annotated to nearest gene using the 'annotatePeak' function in 'ChIPseeker' package in R. To identify differentially accessible regions between different cell lines we used the raw read count matrix in DESeq2 ( 41 ). For this, only genomic regions that were annotated as promoter regions based on the annotatePeak calls were considered. If se v eral promoters were mapped to the same gene, a mean of raw reads over those promoter regions was calculated. To obtain normalized ATAC counts for the comparison of peak accessibility at the promoters of the genes in heterogeneous pathways with random regions, we used library size normalization in DiffBind. If se v eral promoters were mapped to the same gene, a mean of normalized reads over those promoter regions was calculated.

P an-sar coma clustering of patient-specific regulatory networks
We set out to investigate the regulatory processes that dri v e heterogeneity in soft-tissue sarcomas. We started by modeling genome-wide, patient-specific gene regulatory networks for 206 TCGA soft-tissue sarcoma patients using two computational algorithms, PANDA and LIONESS (Figure 1 ). These patient-specific networks include information on likelihoods of regulatory interactions (r epr esented as edge weights) between 623 TFs and 17,899 target genes. To explore and visualize patient heterogeneity based on their regulatory landscapes, we first calculated gene targeting scores in these networks (see Methods), and then used Unif orm Manif old Appr oximation and Pr ojection (UMAP) for visualization. To determine whether regulatory profiles cluster differently than expression data, we also performed UMAP on the expression data ( Figure 2 ).
In both the regulatory networks and expression data, the majority of leiomyosar coma samples, r epr esented by uterine (ULMS) and soft-tissue leiomyosarcoma (STLMS), clustered separately from other sarcoma subtypes, with a more distinct separation observed in the gene expression profiles (Figure 2 ). These two types of leiomyosarcoma both arise from smooth muscle cells, howe v er, they arise from different tissues-of-origin. When only including leiomyosarcoma samples in the UMAP visualization (Supplementary Figure S2A, B), it is clear that co-localization of uterine and soft-tissue leiomyosarcomas is different between the two UMAP embeddings --while ULMS samples separated from STLMS in the expression data (Supplementary Figure S2A), clustering of leiomyosarcoma based on gene regulatory networks did not separate these subtypes (Supplementary Figure S2B). By comparing the local neighbourhood structures between the UMAPs we can see that two UMAPs have low local similarity, meaning that samples have different neighbors in the two embeddings (Supplementary Figure S2C). This indicates that interpatient heterogeneity in leiomyosarcoma tumors based on pa tient-specific regula tory networks is dif ferent from the heterogeneity observed in gene expression data, and thus may lead to a different stratification of patients. Gene r egulatory networks captur e underlying r egula tory dif ferences between samples, such as differential TF-gene targeting. Ther efor e, comparati v e analysis of gene regulatory networks can uncover differences in regulatory relationships that may dri v e inter-patient heterogeneity and identify key TF-gene interactions contributing to these differences. The potential usefulness of such networkbased study of inter-patient heterogeneity is the stratification of patients and identification of novel molecular subtypes.

In-depth analysis of gene regulatory heterogeneity in leiom yosar coma with PORCUPINE
Because the heterogeneity in the gene regulatory landscape of leiomyosarcoma was different from that observed in the expression data, we performed an in-depth analysis of this heterogeneity. To facilitate this, we de v eloped a ne w computa tional tool, PORCUPINE, tha t can be applied to patient-specific gene regulatory networks to identify biological pa thways tha t captur e r egulatory heterogeneity in a patient population (Figure 3 ). PORCUPINE examines regulatory co-variability of edge weights across a cohort of patient-specific networks in a pre-defined set of pathways, e.g. pathways from published r esour ces such as the MsigDB database ( 42 , 43 ). The method performs PCA on all estima ted regula tory interactions connected to genes from a specific pathway. It then compares the variance captured by the first principal component in the pathway to the amount of variance that would be expected by chance. This process is repeated for each pathway. Significant pathways can then be selected based on user-defined thresholds of adjusted Pvalue and effect size.
We applied PORCUPINE to the 80 patient-specific leiomyosar coma gene r egulatory networks from T CGA, using 1,455 Reactome pathways from MSigDB (see Methods). This identified 72 significant pathways (adjusted ...

TCGA-LMS GRN 2 TCGA-LMS GRN 80
...  P -v alue < 0.01, v ariance explained ≥10%, and effect size ≥2). We validated these results in an independent set of patient-specific networks modeled on 37 leiomyosarcoma samples from DKFZ. In the validation dataset, we identified 91 pathways, of which 37 were also identified in the networks modeled on TCGA. This overlap of 37 pathways is higher than expected by chance, with P -value < 9.891e-29 based on a hypergeometric test. The pathway's effect sizes also correlated with a Pearson correlation coefficient of 0.53. This indicates that PORCUPINE's results are robust and highly repr oducible acr oss networks modeled on independent datasets. The 37 pathways that were detected in both datasets are visualized in Figure 4 , with corresponding effect sizes. Notably, the significant pathways varied in size, indica ting tha t PORCUPINE analysis is not biased towards pathways of smaller or larger size (see also Supplementary Table S1).

Regulatory heterogeneity in pathways with known and new roles in leiom yosar coma
The two most significant pathways that were identified in both datasets are 'Inhibition of replication initiation of damaged DNA by RB1 / E2F1' and 'E2F mediated regulation of DNA replication,' containing 13 and 22 genes, respecti v ely. A closer e xamina tion of the genes in these pa thways shows that all 13 genes in the first pathway are also part of the second pathway. PORCUPINE provides evidence of a coordinated change in the regulation of multiple genes in these pathways that is not directly captured by expression data (Supplementary Figure S3). These pathways ar e leiomyosar coma-r ele vant, gi v en that leiomyosarcomas are characterized by a high frequency of alterations in tumor suppressor gene RB1 , which negati v ely regulates transcription factor E2F1 ( 18 ).
The 37 pathways can be further grouped into subcategories according to their cellular function (see Figure 4 ). Pathways with genes involved in cell cycle and signal transduction were the most frequent subcategories. Two pathways were associated with TP53 regulation, including 'TP53 regulates transcription of genes involved in G2 cell cycle arrest' and 'TP53 regulates transcription of cell cycle genes.' Among signal transduction pathways, we found an overrepresenta tion of pa thwa ys in volv ed in fibrob last growth factor receptor (FGFR) signaling, including 'Negati v e regulation of FGFR2 signaling,' 'FGFRL1 modulation of FGFR1 signaling,' and 'ERKs are inactivated.' FGFRs are tyrosine kinase receptors that are involved in se v eral biological functions including regulation of cell gr owth, pr oliferation, survival, differentiation, and angiogenesis. Aberrant FGFR signaling has been shown to be associated with several human cancers and thus FGFRs are attracti v e druggable targets ( 44 ). To our knowledge, among members of the FGFR famil y, onl y the inhibition of FGFR1 has been investigated in a patient with metastatic leiomyosarcoma, which showed clinical improvement ( 45 ). There is an ongoing clinical trial testing the selecti v e pan-FGFR inhibitor rogara tinib to trea t pa tients with advanced sarcoma with alterations in FGFR1-4 ( 46 ). Ov ervie w of PORCUPINE ( P CA to O btain R egulatory C ontributions U sing P athway-based I nterpretation of N etwork E stimates). PORCU-PINE applies the following steps: 1) TF-gene edge weight information is extracted from each individual gene regulatory network for all genes belonging to a certain pathway; 2) Principal Component Analysis is performed on the pa thway-associa ted TF-gene weight matrix. The variance explained by the first principal component is extracted; 3) The amount of variance explained by PC1 is compared to the expected amount of variance explained, which is obtained by a ppl ying PCA on edge weights connected to 1,000 randomly generated gene sets of the same size as the selected pa thway. Ef fect size is calculated. These steps are repeated for each pathway. P -values obtained from step 3 are then corrected for multiple testing with the Benjamini-Hochberg method.
Two pa thways associa ted with immune system function were identified --'CTLA-4 inhibitory signaling' and 'Defensins.' CTLA-4 is an immune checkpoint, and monocolonal antibodies such as ipilimumab and tremelimumab hav e been de v eloped to target CTLA-4. These CTLA-4 inhibitors have already been used in clinical studies for treatment of se v eral cancer types ( 47 ). The efficacy of imm unothera py with CTLA-4 inhibitors in soft-tissue sarcoma has only been evaluated in one study to-date, in which six patients with synovial sarcoma were treated with ipilimumab ( 48 ). To our knowledge, no clinical results testing the effect of anti-CTLA-4 in leiomyosarcoma are available or exist to-date.

Major genes and transcription factors contributing to leiom yosar coma heterogeneity
We next identified those regulatory interactions in each of the 37 pathways that contributed most to the regulatory heterogeneity we observed in leiomyosarcoma (see Methods). Across all pathways, genes including PPP2R1A, PPP2CB, TFDP2, CCNB1 , and RB1 wer e fr equently found among the top targets (Supplementary file 2). These genes ar e r ela ted to cell prolifera tion and growth. Noteworthy, PPP2R1A was among the top contributors in 13 out of 37 pathways and may ther efor e be a key player in driving leiomyosarcoma heterogeneity (see Figure 5 for its contribution to three selected pathways). It encodes for a subunit of protein phosphatase 2 (PP2), which plays a role in the negati v e control of cell growth and division. PP2A inactivation is a crucial step in malignant de v elopment ( 49 ). It was previously shown that PPP2R1A mutation is frequent in uterine cancers ( 50 ). Howe v er, we did not identify an association between the histological subtype of leiomyosarcoma and gene regulatory heterogeneity in pathways that had PPP2R1A among their major contributors. We also did not identify any significant association of patient heterogeneity scores with PPP2R1A mutation profiles, indicating tha t regula tory heterogeneity of PPP2R1A is not dri v en by soma tic muta tions in the gene itself.
In addition to reporting the top target genes, we identified top TFs contributing to regulatory heterogeneity in each pathway. TFs that coordinately regulated multiple targets are shown in Figure 5 C for the three main pathways discussed above and in Supplementary Figure S4 for all pathways. Some TFs had a limited number of targets that they regulate in a coordinated manner, such as in the pathway 'Inhibition of replica tion initia tion of damaged DNA by RB1 / E2F1,' where various TFs target a relati v ely low number of genes. Other TFs, such as E2F8 in 'CTLA4 inhibitory signaling,' were enriched for hetero geneousl y targeting a large number of genes ( Figure 5 C, see also Supplementary Figure S5, which indicates most genes of this pathway are coordinately targeted by E2F8). E2F8 and ZNF282 were the most frequent TFs that connected to a large number of targets across many of the identified pathways (see also Supplementary Figure S4).
The E2F family of TFs contains eight members that play central roles in many biological processes, including cell prolifera tion, dif ferentia tion, DNA repair, cell cycle, and apoptosis. Se v eral studies have shown that dysregulation of E2F8 is associated with oncogenesis and tumor progression in many cancers. For example, it was shown that expression of E2F8 is associated with tumor progression in breast  cancer ( 51 ), human hepatocellular carcinoma ( 52 ) and lung cancer ( 53 ). Howe v er, not much is known about the role and clinical significance of E2F8 in leiomyosaroma, nor in other sarcomas. The role of ZNF282 (Zinc finger protein 282) in human cancers , including sarcomas , is unknown. In a study by Yeo et al. , it was shown that ZNF282 ov ere xpression was associated with poor survival in esophageal squamous cell carcinoma, and depletion of ZNF282 inhibited cell cycle progression, migration, and invasion of cancer cells ( 54 ). Additionally, the authors provided evidence that ZNF282 functions as an E2F1 co-activator, highlighting a potential connection between this TF and E2F signaling.

Regulatory heterogeneity in leiom yosar coma is not associated with clinical features, somatic mutations or DNA methylation
We ne xt e xplored if the heterogeneity we observed in leiomyosarcoma gene regulatory networks is associated with known features that may influence patient heterogeneity, such as clinical features and genomic data.
To investigate whether the identified pathways were associa ted with clinicopa thological fea tures, we performed an association analysis of the pathway-based patient heterogeneity scores with clinical features available from the TCGA and DKFZ resources (Supplementary Figure S6). Ther e wer e no significant associations between the clinical features and the pathway-based patient heterogeneity scores on the first principal component (at FDR-adjusted P -value < 0.05). To determine whether any of the identified pathways were related to patient survival, we used the pathwaybased patient heterogeneity scores on the first principal component in Cox r egr ession models to predict patient outcome. We did not identify any significant associations with survival.
To evaluate if any of the identified pathways could classify patients with similar mutational profiles, we associated the first principal component in these pathways with pathway mutation scores. To do so, we downloaded and processed muta tion da ta obtained fr om leiomyosarcoma tumors fr om TCGA (available for 72 / 80 patients) as described in Kuijjer et al. ( 33 ). We performed a Kruskal-Wallis test to compare the pa thway-based pa tient heterogeneity scores on the first principal component in each of the 37 pathways between two groups, i.e. mutated compared to not mutated, for each muta ted pa thway. No significant dif fer ences wer e identified (FDR-adjusted P -value < 0.05), indicating that the separation of leiomyosarcoma patients identified with PORCU-PINE is independent of tumor mutation profiles. Thus, gene regulation may potentially be a new, mutation-independent mechanism driving patient heterogeneity.
To investigate if the patient heter ogeneity pr ofiles were associated with inter-individual differences in the tumor's methylation profiles, we performed correlation analysis of the pathway-based patient heterogeneity scores on PC1 with overall DNA methylation profiles of individual tumors. Ther e wer e no significant associations (FDRadjusted P -value < 0.05), indicating that regulatory heterogeneity in leiomyosarcoma is independent of methylation status.

Stratification of patients based on the identified pathways
PORCUPINE allows to identify patient subtypes based on gene regulatory networks for each of the significant pathways. For this, K -means clustering is applied to the pathwaybased patient heterogeneity scores on the first two principal components. Supplementary Figure S7 shows the identification of two leiomyosarcoma subtypes based on the top hetero geneousl y regula ted pa thway --'E2F media ted regulation of DNA replication.'

Regulatory heterogeneity of the identified pathways is not observed in healthy tissues
To explore if the 37 pathways we identified were cancerspecific, we assessed gene regulatory heterogeneity in healthy smooth muscle-deri v ed tissues, r epr esented by esophageal muscularis and uterus. In total, 283 esophageal muscularis and 90 uterus sample-specific gene regulatory networks, modeled with PANDA and LIONESS, were available from the GTEx project through the GRAND database ( 35 ). We used PORCUPINE to characterize regulatory heterogeneity in this dataset. In total, 27 and 25 significant pathways were identified in healthy smoothmuscle-deri v ed tissues uterus and esophageal muscularis, respecti v ely. Among the 37 pathways identified to dri v e leiomyosarcoma hetero geneity, onl y one pathway, i.e. 'Ga p junction degradation' was significant in these healthy tissues, indica ting tha t 36 / 37 pa thways we identified are leiomyosarcoma-specific and that gene regulatory heterogeneity in these pathways likely de v elops during sarcomagenesis.

Regulatory heterogeneity associates with chromatin state
Finally, we investigated whether network heterogeneity corresponds to chromatin accessibility. To do so, we profiled RNA-seq and ATAC-seq for four leiomyosarcoma cell lines. To translate our findings on the inter-patient heterogeneity in leiomysarcoma to the cell lines, we constructed cell line specific gene regulatory networks based on the RNAseq data, and placed these networks on the regulatory map of leiomyosarcoma patients. Cell lines clustered among the DKFZ-LMS patient specific networks (Supplementary Figure S8), and we could confirm 29 / 37 pathways when we included these cell lines in our analyses. We next clustered ATAC-seq profiles of the four cell lines (three replicates for each cell line, see Supplementary Figure S9). The cell lines had distinct chromatin profiles with SK-LMS-1 and MES-SA clustering separately from SK-UT-1 and SK-UT1-B --two cell lines that are deri v ed from the same donor. We then assessed whether promoters of genes from the significant pathways detected by PORCUPINE are located within open chromatin regions. To do so, we compared peak accessibility at the promoters of these genes to that at promoters of randomly selected genes. We observed a significant enrichment in open chromatin regions for the hetero geneousl y regulated genes (Figure 6 A). In addition, we compared expression of these genes to randomly selected gene sets, and found they are also highly expressed (Figure 6 B). This suggests that genes that are located in open chromatin regions are more likely to be regulated by different sets of TFs, which could have implications for network-based biomarker detection or the de v elopment of subtype-specific targets for treatment.
Finall y, we evaluated w hether genes from the top hetero geneousl y regulated pathway 'E2F mediated regulation of DNA r eplication' wer e also over-r epr esented in differ entially accessible regions between leiomyosarcoma cell lines. To do so, we called differentially accessible regions in pairwise comparisons between the four cell lines (six comparisons in total). The promoter of PPP2R1A , the gene we found to be most enriched for heterogeneous regulation in the two patient cohorts, was differentially accessible in all pairwise comparisons between cell lines, except between SK-UT-1 and SK-UT1-B (see Supplementary Table S2).
Howe v er, as these two cell lines are deri v ed from the same donor, they are expected to have comparable regulatory profiles. This indica tes tha t the dif ferential heterogeneity observed in the patient-specific regulatory networks are not only associated with open chromatin states in general, but also with more subtle differences in chromatin landscapes between individual tumors.

Discussion
In this work, we hypothesized that classification of softtissue sarcoma patients on the basis of gene regulatory networks has the potential to provide additional, novel information to stratify patients into clinically meaningful subgroups, to point to potential new targets for treatment, and to identify ne w biomar kers to guide selecting patients most likely to benefit from a specific treatment.
To this end, we de v eloped PORCUPINE, a novel computational approach to map heterogeneity of gene regulation across a patient population. We applied the method to model heterogeneity of gene regulation in leiomyosarcoma, which we found to present a high le v el of heterogeneity in a pan-sarcoma network anal ysis. A ppl ying PORCU-PINE to two independent leiomyosarcoma cohorts identified 37 pathways that robustl y ca pture gene regulatory heterogeneity in the disease. Among the detected pathways, we identified pa thways tha t could r epr esent potential targets for treatment of subgroups of leiomyosarcoma patients, including RB1 / E2F1 signaling, pathwa ys in volved in FGFR signaling, and CTLA4 inhibitory signaling. While these pathways have been described as potential targets for treament of sarcomas, not all patients may respond to such approaches , as , for example , was recently shown for treatment with a CTLA4 inhibitor in synovial sarcoma ( 48 ). Stratifying patients based on the regulatory profiles of these pathways could potentially help identify subgroups of patients that are likely to respond to trea tments tha t act on these pathways.
PORCUPINE highlighted genes and TFs that are enriched in driving heterogeneity among leiomyosarcoma patients, including RB1 and PPP2R1A as target genes, as well as the TFs E2F8 and ZNF282, which could potentially be inhibited ( 55 ). Through gene regulatory network modeling and ATAC-seq profiling in leiomyosarcoma cell lines, we found that promoters of the most hetero geneousl y regulated genes in leiomyosarcoma are enriched for open chromatin states. This suggests that genes in open chromatin states may be more prone to recei v e differential binding by TFs, which could have implications for the detection of regulatory biomarkers or subtype-specific targets for treatment.
We performed our study on four leiomyosarcoma cell lines that are also r epr esented in e xtensi v e cell profiling and functional genomics initiati v es such as DepMap from Broad Institute. While we could capture heterogeneous regulation in most of the identified pathways, the small number of cell lines may likely not fully r epr esent the landscape of heterogeneous gene regulation we observed in the patient cohorts, which is a limitation of our study. Howe v er, we could still identify significant dif ferential chroma tin sta tes for the top hetero geneousl y regulated gene in the patient popula tion, PPP2R1A , indica ting tha t our network models may potentially also capture subtle differences in chromatin states in a patient population.
We de v eloped PORCUPINE as a user-friendly R package that can be applied to single-sample networks. While similar approaches have previously been successfully applied to study heterogeneity in cancer using gene expression profiles ( 56 ), our approach differs from these methods as we specifically designed it to analyze large-scale, genome-wide gene regulatory networks. Of note, while we used PORCUPINE on networks modeled with PANDA and LIONESS, the tool is not limited to these specific methodolo gies, and could potentiall y also be used to analyze (bipartite) networks modeled with other single-sample approaches. For example, it can potentially be applied to gene regulatory networks from single cell RNA-seq data modeled with SCORPION ( 57 ). Of course, w hen a ppl ying PORCUPINE, one should consider cohort sample size as well as the use of an independent validation dataset, as we showed here by including an independent leiomyosarcoma dataset, which are both important to include to detect relevant and robust pathways. Additionally, it is important to note that, while the use of a large set of randomized pathways is beneficial, it comes with the disadvantage of an increase in computational load.
Genome-wide gene regulatory networks r epr esent highdimensional data. Usually, network summary statistics, such as gene targeting scores, closeness centrality, or betweenness centrality, are calculated prior to any further analysis to reduce the dimensions of large-scale networks. Then, to identify heterogeneity across a cohort, unsupervised clustering approaches are widely used ( 58 ). The advantage of PORCUPINE is that it can be directl y a pplied to high-dimensional networks, as it uses as input the network's edge weights instead of a summary statistic. Moreover, as it does this per individual biological pathway, the output is not just a collection of significant differential edges that need to be further analyzed, but rather a list of differentially regulated pathways that are easy to interpret. Additionally, the method can capture significant aspects of heterogeneity among individuals in situations when no clear population structure with well defined clusters can be revealed. PORCUPINE estima tes pa thway-based pa tient heterogeneity scores that can facilitate the identification of either continuous gradients or discrete gene regulatory subtypes and that can be further used in association analyses with clinical covariates, or in survival analyses, as we have shown in this work.
In summary, with PORCUPINE, we uncovered patterns of inter-patient heterogeneity at the le v el of transcriptional regulation in tumors and cell models, and identified genes and pathways that may r epr esent therapeutic entry points in leiomyosarcoma. Our approach thereby provides one of the first steps towards implementing network-informed personalized medicine in soft-tissue sarcomas.

DA T A A V AILABILITY
The input data to reconstruct individual patient networks and the set of reconstructed networks used in this study are available on Zenodo ( https://doi.org/10.5281/zenodo. 8105729 ). Description of files is provided in Supplementary File 3. PORCUPINE is de v eloped as an R package and it is available as open-source code on GitHub ( https://github. com/kuijjerlab/PORCUPINE ) and on Zenodo ( https://doi. org/10.5281/zenodo.8105729 ).

SUPPLEMENT ARY DA T A
Supplementary Data are available at NAR Cancer Online.